Coordinate-based implicit neural networks, or neural fields, have emerged as useful representations of shape and appearance in 3D computer vision. Despite advances, however, it remains challenging to build neural fields for categories of objects without datasets like ShapeNet that provide canonicalized object instances consistently aligned in 3D position and orientation (pose). We present Canonical Field Network (CaFi-Net), a self-supervised method to canonicalize the 3D pose of instances from an object category represented as neural fields, specifically neural radiance fields (NeRFs). CaFi-Net learns directly from continuous and noisy radiance fields using a Siamese network architecture designed to extract equivariant field features for category-level canonicalization. During inference, our method takes pre-trained neural radiance fields of novel object instances at arbitrary 3D pose and estimates a canonical field with consistent 3D pose across the entire category. Extensive experiments on a new dataset of 1300 NeRF models across 13 object categories show that our method matches or exceeds the performance of 3D point cloud-based methods.
Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero- and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diversity of the instruction-tuning benchmark, different task sampling strategies, fine-tuning with and without demonstrations, training using specialized datasets for reasoning and dialogue, and finally, the fine-tuning objectives themselves. In this paper, we characterize the effect of instruction-tuning decisions on downstream task performance when scaling both model and benchmark sizes. To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalization: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks. Through the lens of this framework, we first present insights about instruction-tuning decisions as applied to OPT-30B and further exploit these insights to train OPT-IML 30B and 175B, which are instruction-tuned versions of OPT. OPT-IML demonstrates all three generalization abilities at both scales on four different evaluation benchmarks with diverse tasks and input formats -- PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG. Not only does it significantly outperform OPT on all benchmarks, but it is also highly competitive with existing models fine-tuned on each specific benchmark. We release OPT-IML at both scales, together with the OPT-IML Bench evaluation framework.
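Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.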
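Semi-supervised learning approaches have emerged as an active area of research to combat the challenge of obtaining large amounts of annotated data. Towards the goal of improving the performance of semi-supervised learning methods, we propose a novel framework, HierMatch, a semi-supervised approach that leverages hierarchical information to reduce labeling costs and performs as well as a vanilla semi-supervised learning method. Hierarchical information is often available as prior knowledge in the form of coarse labels (e.g., woodpeckers) for images with fine-grained labels (e.g., downy woodpeckers or golden-fronted woodpeckers). However, the use of supervision from coarse category labels to improve semi-supervised techniques has not been explored. In the absence of fine-grained labels, HierMatch exploits the label hierarchy and uses coarse class labels as a weak supervisory signal. Additionally, HierMatch is a generic approach for improving any semi-supervised learning framework; we demonstrate this with results on the recent state-of-the-art techniques MixMatch and FixMatch. We evaluate the efficacy of HierMatch on two benchmark datasets, namely CIFAR-100 and NABirds. Compared to MixMatch, HierMatch can reduce the usage of fine-grained labels by 50% on CIFAR-100 with only a marginal drop of 0.59% in top-1 accuracy. Code: https://github.com/07agarg/hiermatch.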
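As a rough illustration of how a coarse label can act as a weak supervisory signal, the sketch below sums a classifier's fine-grained class probabilities over each coarse parent and applies cross-entropy against the coarse label. This is one simple formulation chosen for illustration, not necessarily the loss used by the authors; the function names and the `fine_to_coarse` mapping are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_probs(fine_probs, fine_to_coarse, num_coarse):
    """Sum fine-class probabilities into their coarse parent classes.

    fine_to_coarse[i] is the (assumed) coarse class index of fine class i.
    """
    out = [0.0] * num_coarse
    for fine_idx, p in enumerate(fine_probs):
        out[fine_to_coarse[fine_idx]] += p
    return out


def coarse_cross_entropy(fine_logits, coarse_label, fine_to_coarse, num_coarse):
    """Cross-entropy on the coarse label, computed from fine-grained logits."""
    probs = coarse_probs(softmax(fine_logits), fine_to_coarse, num_coarse)
    return -math.log(max(probs[coarse_label], 1e-12))


if __name__ == "__main__":
    # Hypothetical hierarchy: fine classes 0 and 1 are woodpecker species
    # (coarse class 0); fine class 2 belongs to another coarse class (1).
    mapping = [0, 0, 1]
    print(coarse_cross_entropy([2.0, 1.0, -1.0], 0, mapping, 2))
```

A sample that carries only a coarse label can then contribute this term to the training objective, while samples with fine-grained labels use the usual fine-grained loss.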
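Full 3D estimation of human pose from a single image remains a challenging task despite many recent advances. In this paper, we explore the hypothesis that strong prior information about scene geometry can be used to improve pose estimation accuracy. To tackle this question empirically, we have assembled a novel $\textbf{Geometric Pose Affordance}$ dataset, consisting of multi-view imagery of people interacting with a variety of rich 3D environments. We used a commercial motion capture system to collect gold-standard estimates of pose and to construct accurate geometric 3D CAD models of the scene itself. To inject prior knowledge of scene constraints into existing frameworks for pose estimation from images, we introduce a novel, view-based representation of scene geometry, a $\textbf{multi-layer depth map}$, which employs multi-hit ray tracing to concisely encode multiple surface entry and exit points along each camera view ray direction. We propose two different mechanisms for integrating multi-layer depth information into pose estimation: first, as encoded ray features used as input when lifting 2D pose to full 3D, and second, as a differentiable loss that encourages learned models to favor geometrically consistent pose estimates. We show experimentally that these techniques can improve the accuracy of 3D pose estimation, particularly in the presence of occlusion and complex scene geometry.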
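The idea of recording several surface entry and exit depths along one camera ray can be sketched in a few lines. The example below is only a toy under stated assumptions: the scene is a set of non-overlapping axis-aligned boxes rather than the CAD meshes described above, the ray direction is assumed to be unit length so that the ray parameter equals depth, and the names `ray_box_hits` and `multilayer_depths` are hypothetical.

```python
import math


def ray_box_hits(origin, direction, box_min, box_max):
    """Slab test: return (entry_depth, exit_depth) for one box, or None."""
    t_near, t_far = -math.inf, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this slab; it misses unless it starts inside.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
    if t_near > t_far or t_far < 0:
        return None
    # Clamp the entry depth to 0 when the ray starts inside the box.
    return (max(t_near, 0.0), t_far)


def multilayer_depths(origin, direction, boxes, max_layers=4):
    """Collect sorted entry/exit depths along one ray, up to max_layers values."""
    depths = []
    for box_min, box_max in boxes:
        hit = ray_box_hits(origin, direction, box_min, box_max)
        if hit is not None:
            depths.extend(hit)
    depths.sort()
    return depths[:max_layers]


if __name__ == "__main__":
    # Two boxes stacked along +z, viewed by a ray from the origin.
    scene = [((-1, -1, 1), (1, 1, 2)), ((-1, -1, 4), (1, 1, 5))]
    print(multilayer_depths((0, 0, 0), (0, 0, 1), scene))
```

Evaluating `multilayer_depths` for every pixel's view ray would yield a fixed number of depth channels per pixel, which is the general shape of a multi-layer depth map; a single-layer depth map would keep only the first entry point.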